FOREWORD


In  this  report,  a  novel  set  of  affine  invariant  features  and 
an  accompanying  neural  network  are  described  for  performing  two- 
dimensional  pattern  recognition,  specifically,  handwritten 
character  recognition.  The  results  achieved  by  the  network  on 
single  characters,  distinct  words,  and  similar  words  are 
presented.  The  Concepts  and  Technologies  Branch  (G42)  is 
applying  artificial  neural  systems  technology  to  areas  which 
require  fast,  accurate,  and  robust  recognition  and 
classification. 

This  study  was  partially  funded  by  the  Office  of  Naval 
Research,  Summer  Faculty  Program. 

This  technical  report  was  reviewed  by  Dr.  Kenneth  F.  Caudle, 
Head  of  the  Advanced  Weapons  Division. 


Approved  by: 


KURT  F.  MUELLER 
Deputy  Department  Head 
Weapon  Systems  Department 

INTRODUCTION


The  problem  of  recognizing  handwritten  characters  and/or  words 
is  one  that  has  delighted,  intrigued,  and  puzzled  researchers  in 
Artificial  Intelligence  for  many  years.  The  solution  to  the  problem 
could  result  in  untold  savings  in  time  and  money  to  business, 
industry,  and  government. 

Many significant advances have been made in this area,
including the development of the Neocognitron, a hierarchical
neural network organized like the visual cortex and consistent with
the visual system described by Hubel and Wiesel.1 The network has
the  advantages  of  being  affine  invariant  and  of  being  relatively 
insensitive  to  noise  and  distortion.  It  has  the  disadvantage  of 
being  one  of  the  most  complicated  neural  networks  ever  devised. 

Other  approaches  have  been  taken  and  even  some  commercially 
available  software  products  have  been  developed.  It  is  probably 
still  correct  to  say,  however,  that  the  final  solution  has  not  been 
achieved. The problem of segmentation looms large over any attempt
at  solution.  A  new  approach — combining  features  initially 
developed  for  target  recognition  with  a  simple  neural  network — will 
be  described.  While  the  approach  taken  does  not  purport  to  solve 
the  problems  of  segmentation  and  proliferation  of  correct  forms 
described  above,  it  will  be  shown  to  have  the  advantages  of 
simplicity  and  accuracy. 


APPROACH 

Fuller and Farsaie2 originally described and used a set
of  new  features  (Theta  Neighbors)  for  identification  of  and 
differentiation  among  targets.  We  formalize  the  notion  of  Theta 
Neighbors  as  follows: 

Let  I  represent  a  digitized  image  contained  in  an  n  x  m  pixel 
grid.  Let  (X,Y)  denote  the  center  of  mass  of  I.  Without  loss  of 
generality,  we  may  assume  that  (X,Y)  =  (0,0)  since  a  simple 
translation  would  accomplish  the  desired  result.  Let  theta  be  a 
fixed angle, 0 < theta < 360 degrees, and let (i,j) and (l,m) be
two pixels in the n x m pixel grid. We say that (i,j) and (l,m)
are Theta Neighbors if and only if for some radius R and some
angles phi[1] and phi[2]

1. (i,j) and (l,m) are both pixels in I

2. (i,j) = (R*cos(phi[1]), R*sin(phi[1]))

   (l,m) = (R*cos(phi[2]), R*sin(phi[2]))

   Thus (i,j) and (l,m) lie on the same circle of radius R
   about the origin

3. ABS(phi[1] - phi[2]) = theta, where ABS denotes absolute
   value.

A  set  of  features  may  be  constructed  as  follows: 

Let {theta[1], theta[2], ..., theta[n]} be a set of distinct angles
with 0 < theta[i] < 360 for i = 1,2,...,n. Let N(theta[i]) denote
the number of pixels in I which have theta[i] neighbors and let
F[i] = N(theta[i])/A where A is the area of the circle of minimum
radius needed to enclose I.

The  strength  of  Theta  Neighbors  in  identifying  patterns  is 
summarized  in  the  following  propositions: 

Proposition 1: Let I be as above and let I(theta[i]) be the
image obtained when I is rotated by theta[i] about its center
of mass. Then N(theta[i]) may be obtained by simply counting
the pixels in the intersection of I and I(theta[i]).

Proposition 2: The set of features {F[1], F[2], ..., F[n]}
described above is affine invariant.
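
A minimal sketch of how Proposition 1 might be turned into a feature calculation, under stated assumptions: the image is supplied as a collection of integer (x, y) pixel coordinates, rotation about the center of mass is followed by rounding to the nearest pixel, and A is taken as the area of the smallest circle centered at the center of mass that reaches the farthest pixel. The name theta_feature and these discretization choices are illustrative only and are not taken from the report.

```python
import math

def theta_feature(pixels, theta_deg):
    """Estimate F = N(theta) / A for an image given as (x, y) pixel coordinates."""
    pts = set(pixels)
    if not pts:
        raise ValueError("image has no pixels")
    n = len(pts)
    # Center of mass of the image.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    count = 0
    for x, y in pts:
        dx, dy = x - cx, y - cy
        # Rotate the pixel by theta about the center of mass, then snap to the grid.
        rx = round(cx + c * dx - s * dy)
        ry = round(cy + s * dx + c * dy)
        # The pixel counts if its rotated copy also lies in the image.
        if (rx, ry) in pts:
            count += 1
    # Assumption: the enclosing circle is centered at the center of mass.
    radius = max(math.hypot(x - cx, y - cy) for x, y in pts)
    radius = max(radius, 0.5)  # avoid a zero area for a single-pixel image
    return count / (math.pi * radius * radius)

# A solid disk of radius 30 and a thin horizontal bar of the same extent.
disk = [(x, y) for x in range(-30, 31) for y in range(-30, 31)
        if x * x + y * y <= 900]
bar = [(x, 0) for x in range(-30, 31)]
disk_features = [theta_feature(disk, th) for th in (45, 90)]
bar_features = [theta_feature(bar, th) for th in (45, 90)]
```

In this sketch the disk yields feature values close to 1 and the bar yields values close to 0, since almost none of the bar's pixels land back on the bar after a 45 or 90 degree rotation.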

We  omit  formal  proofs  of  propositions  1  and  2  and  we  simply 
state  that  rotation  and  translation  invariance  of  the  features  is 
obvious  since  Theta  Neighbors  are  determined  with  respect  to  the 
center  of  mass  of  I.  Scale  invariance  is  apparent  if  it  is 
recognized  that  N(theta[i])  is  the  area  of  a  particular  subset  of 
a  circle.  As  the  radius  of  this  circle  increases  or  decreases, 
N(theta[i]) will increase or decrease proportionately with the
square of the radius, or equivalently with the area.
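
The scale argument can be written out as a short worked relation. Here s > 0 is a hypothetical magnification factor applied about the center of mass, R_max is the radius of the enclosing circle, and the subscripted symbols N_s, A_s, and F_i^{(s)} are notation introduced only for this sketch. On a pixel grid the relations hold approximately because of discretization.

```latex
% Effect of magnifying the image I by a factor s about its center of mass.
\begin{align*}
N_s(\theta_i) &\approx s^{2}\, N(\theta_i)
  && \text{(the count is an area, so it scales with } s^{2}\text{)} \\
A_s &= \pi \,(s R_{\max})^{2} = s^{2} A
  && \text{(the enclosing circle scales with the image)} \\
F_i^{(s)} &= \frac{N_s(\theta_i)}{A_s}
  \approx \frac{s^{2} N(\theta_i)}{s^{2} A} = F_i
\end{align*}
```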

Perhaps  more  important  than  the  calculation  of  the  features 
just  described  is  the  motivation  for  choosing  them  in  the  first 
place.  This  motivation  comes  from  the  political  cartoonist  who — 
when  constructing  a  caricature  of  a  famous  person — chooses  and  then 
emphasizes  a  prominent  feature  of  that  person.  An  obvious  question 
is  "How  are  the  prominent  features  identified?" 

While this question is difficult to answer quantitatively, one
can  say  prominent  features  are  those  which  humans  identify  as  being 
substantially  different  from  some  internalized  norm.  By  performing 
a  series  of  mental  subtractions  from  a  norm,  humans  accomplish 
recognition. 

This  reasoning  is  the  basis  for  choosing  the  features 
{F[1], F[2], ..., F[n]} described above. If the image I is a solid
circle, it is seen that F[i] = 1 for i = 1,2,...,n. Any point in
the circle would have a Theta Neighbor for any value of theta and
N(theta[i]) would simply be the area of the circle. As I departs
from being a circle (the network's internalized norm), the values of
F[i] measure how large or small this departure has become. Values of
F[i] substantially different from 1 or very close to 1
quantitatively identify "prominent" features of I.

The need for a neural network may be questioned at this point since
the features we have described are invariant. In Reference 2 the
problems with aliasing and jaggies are documented when each target
has only one correct form. Handwritten character recognition has not
only the problems of aliasing and jaggies, but also the problem of a
multitude  of  correct  forms  for  characters  or  words.  Neural 
networks  were  chosen  as  the  method  to  deal  with  these  problems. 

Previously,  Kohonen  networks  were  used  to  learn  the  extracted 
features.  While  the  results  were  good,  these  networks  suffer  from 
long  training  times  and  problems  when  new  examples  of  old  classes 
are  added  late  in  training.  For  these  reasons,  the  Cluster 
Euclidean  network  was  chosen  for  use  in  testing  and  training.  This 
network is described by Lippmann in Reference 3.
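
The Cluster Euclidean network itself is specified in Reference 3 and is not reproduced here. As a rough stand-in, the sketch below assumes that each class is represented by a single cluster center (a running mean of its training examples) and that an input is assigned to the class whose center is nearest in Euclidean distance. The class name, method names, and feature values are hypothetical and chosen only for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NearestCentroidClassifier:
    """Keeps one running-mean cluster center per class label (an assumption)."""

    def __init__(self):
        self.centers = {}  # label -> list of mean feature values
        self.counts = {}   # label -> number of examples seen so far

    def train(self, features, label):
        """Fold one example into its class center; no retraining pass is needed."""
        if label not in self.centers:
            self.centers[label] = list(features)
            self.counts[label] = 1
            return
        k = self.counts[label] + 1
        center = self.centers[label]
        for i, f in enumerate(features):
            center[i] += (f - center[i]) / k  # incremental mean update
        self.counts[label] = k

    def classify(self, features):
        """Return the label whose center is closest to the input."""
        if not self.centers:
            raise ValueError("classifier has not been trained")
        return min(self.centers,
                   key=lambda lab: euclidean(features, self.centers[lab]))

# Made-up (F[1], F[2]) vectors purely for illustration; not data from the report.
net = NearestCentroidClassifier()
net.train((0.30, 0.20), "2")
net.train((0.34, 0.24), "2")
net.train((0.60, 0.55), "8")
first_guess = net.classify((0.58, 0.50))
net.train((0.45, 0.10), "4")  # a new class added late in training
second_guess = net.classify((0.44, 0.12))
```

Because each training example only updates one running mean, adding examples or whole classes late does not disturb what was learned earlier. A class with several quite different correct forms would likely need more than one center per class, which this sketch does not attempt.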


TRAINING  AND  TESTING 

Due  to  the  many  correct  forms  of  handwritten  characters,  a 
definitive  test  is  difficult  to  devise.  To  establish  the  validity 
of  the  approach  taken  in  this  paper,  it  was  decided  to  test  the 
network  in  the  following  three  ways: 

1. Single characters: to establish the capability of
the network to distinguish among characters having
little structure

2. Distinct words: to establish the capability of the
network to differentiate among nonsimilar words

3. Similar words: to establish the capability of the
network to differentiate between very similar words

To  accomplish  training  and  testing,  a  program  was  written 
which  allowed  the  user  to  use  a  mouse  to  draw  characters 
and write words in a 100 x 100 pixel grid. Two features
were chosen corresponding to theta = 45 degrees and
theta  =  90  degrees. 


SINGLE  CHARACTERS 

In  the  single  characters  training,  the  user  was  told  to 

1.  write  legibly  and 

2. give the network 5 examples each of the characters
2, 4, 6, and 8.

For  testing,  the  same  tester  was  told  to 

1. write legibly and

2. show the network the sequence 2, 4, 6, 8 five times and
record the responses.

The  results  are  shown  in  the  following  table: 

Character          2    4    6    8

Number Correct     4    5    5    5

Total correct      19/20 = 95%

The  only  incorrect  response  occurred  when  the  network 
classified  a  2  as  a  6.  This  is  understandable  when  one  considers 
the  invariance  of  the  features  and  the  fact  that  a  handwritten  6 
often  looks  like  a  mirror  image  of  a  handwritten  2. 

DISTINCT WORDS

To  train  the  network  to  differentiate  between  distinct  words,  a 
tester  was  instructed  to 

1.  Write  legibly 

2. Show the network 2 examples each of the "words" KOHO, ART I,
and BPROP

For  testing,  the  tester  was  instructed  to 

1.  Write  legibly 

2. Repeat the sequence KOHO, ART I, and BPROP 4 times and
record the results

The  network  correctly  classified  all  16  words  it  was  presented 
during  the  test. 

SIMILAR  WORDS 

To  train  the  network  to  differentiate  between  similar  words,  the 
tester  was  instructed  to 

1.  write  legibly 

2. Show the network examples of the words "BELL" and "BiLL"
until it appeared the network was capable of
making a distinction between them. This required
approximately 20 examples of each word.

For testing, the tester was instructed to

1. Write legibly

2. Repeat the sequence "BELL" and "BiLL" 6 times and record
the results.

The  results  showed  that  a  correct  identification  was  made  in 
11  of  the  12  test  cases.  With  no  additional  training,  the  network 
was  asked  to  identify  the  words  "BELLs"  and  "BiLLs."  The  network 
correctly  identified  "BELLs"  as  being  most  similar  to  "BELL."  With 
one  more  training  example,  the  network  was  also  capable  of 
identifying  "BiLLs"  as  being  most  similar  to  "BiLL." 

The  overall  performance  of  the  network  is  summarized  in  the 
following  table. 


                      Training    Percent
                      Examples    Correct

Single Character          5          95

Distinct Words            2         100

Similar Words            20          92


SUMMARY  AND  CONCLUSIONS 


The  results  presented  in  this  paper  show  great  promise  for  the 
use  of  Theta  Neighbors  and  neural  networks  for  handwritten 
character  recognition.  In  addition,  nothing  in  this  paper  was 
peculiar  to  handwritten  character  recognition.  Hence  the  approach 
may be applied to the problem of pattern recognition in general.
The  features  described  are  easily  calculated,  and  the  accompanying 
neural  network  is  easy  to  program  and  quick  to  train. 

By  using  the  features  described  in  this  paper  and  a  more 
complicated  neural  network,  the  authors  are  achieving  good  results 
on  the  problem  of  three-dimensional  target  recognition.  The  results 
of  this  work  should  appear  soon. 

The  authors  expect  to  continue  to  investigate  the  problem  of 
target  recognition  using  neural  networks  and  features  such  as  those 
described  here.  Emphasis  will  be  placed  on  partially  obscured 
targets  in  three  dimensions. 